1. Introduction

Consider a standard hidden Markov model $(X,Y)$, where $X = (X_n)_{n\in\mathbb{Z}_+}$ and $Y = (Y_n)_{n\in\mathbb{Z}_+}$ are the hidden state and the observation processes respectively. The state process $X$ is Markov with values in a subset $S \subseteq \mathbb{R}$, transition probability $Q$ and initial distribution $\mathcal{M}$: for all measurable subsets $A \subseteq S$,

$$P(X_1 \in A) = \mathcal{M}(A),$$

$$P(X_n \in A \mid X_{n-1}) = Q(X_{n-1}, A), \quad P\text{-a.s.}, \quad n > 1.$$

We shall consider either countable $S$, in which case $q(u,v) := Q(u,\{v\})$ and $\mu(u) := \mathcal{M}(\{u\})$, or $S = \mathbb{R}$, assuming that $Q(u,dv)$ and $\mathcal{M}(du)$ have densities $q(u,v)$ and $\mu(u)$ with respect to the Lebesgue measure. The precise meaning of $q(u,v)$ and $\mu(u)$ should be obvious from the context.

The observed process $Y$ forms a sequence of conditionally independent random variables, given $X_{1:\infty} = (X_1, X_2, \ldots)$, with the observation density $p$:

$$P(Y_n \in B \mid X_{1:\infty}) = \int_B p(X_n, y)\,dy,$$

for any Borel $B \subseteq \mathbb{R}$.

The path estimation problem is to reconstruct the trajectory of the hidden process^1 $X_{1:n} = (X_1, \ldots, X_n)$, given the realization of $Y_{1:n} = (Y_1, \ldots, Y_n)$ for a fixed horizon $n \ge 1$.

^1 Hereafter, for $x \in \mathbb{R}^n$, $x_m$ stands for the $m$-th entry of $x$ and $x_{k:m}$, $k < m$, denotes the vector $x_{k:m} = (x_k, \ldots, x_m)$; $|x_{1:n}| = \max_i |x_i|$ and $\|x_{1:n}\| = \sqrt{\sum_{i=1}^n x_i^2}$.
If $S$ is a discrete set, a natural estimator is the maximizer of the a posteriori probability (MAP estimator):

$$\hat X^n_{1:n} := \operatorname{argmax}_{x_{1:n} \in S^n} P(X_{1:n} = x_{1:n} \mid Y_{1:n}),$$

where the optimal path is chosen according to the lexicographical order on $S^n$, induced by an order on $S$, whenever the maximum is not unique. The obtained path minimizes the probability of error among all estimators depending on $Y_{1:n}$, that is

$$P(\hat X^n_{1:n} \ne X_{1:n}) \le P(\xi_{1:n} \ne X_{1:n}), \quad \text{for all } \sigma\{Y_1, \ldots, Y_n\}\text{-measurable } \xi_{1:n}.$$
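By the Bayes formula,

$$P(X_{1:n} = x_{1:n} \mid Y_{1:n}) = \frac{L_n(x_{1:n}; Y_{1:n})}{\sum_{z_{1:n} \in S^n} L_n(z_{1:n}; Y_{1:n})},$$

where $L_n$ is the "posterior" likelihood:

$$L_n(x_{1:n}; y_{1:n}) = \mu(x_1)\,p(x_1, y_1) \prod_{m=2}^{n} q(x_{m-1}, x_m)\,p(x_m, y_m), \quad x_{1:n} \in S^n, \tag{1.1}$$

and hence

$$\hat X^n_{1:n} = \operatorname{argmax}_{x_{1:n} \in S^n} L_n(x_{1:n}; Y_{1:n}).$$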

Due to the product structure of L„, the search for the maximizing path can be carried 
out efficiently by a dynamic programming procedure, called the Viterbi algorithm after 
A. Viterbi, who introduced it in the context of error correction codes. 
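
The dynamic programming recursion can be sketched as follows for a finite state space. This is a minimal illustration under stated assumptions, not the implementation of any cited work: the names `viterbi` and `brute_force_map`, the list-based inputs and the log-domain scoring are choices made here, and ties are broken by the lowest state index at each step rather than by the lexicographical rule described above.

```python
import itertools
import math


def _log(v):
    # Logarithm of a probability, with the convention log(0) = -inf.
    return math.log(v) if v > 0 else -math.inf


def viterbi(mu, q, p_obs):
    """Maximize L_n of (1.1) over paths in {0, ..., d-1}^n.

    mu[i]       : initial probability of state i
    q[i][j]     : transition probability from i to j
    p_obs[m][i] : observation density p(i, y_m), precomputed (assumed input)
    Returns (optimal path, log of the maximal likelihood).
    """
    n, d = len(p_obs), len(mu)
    score = [_log(mu[i]) + _log(p_obs[0][i]) for i in range(d)]
    back = []  # back[m-1][j] = best predecessor of state j at time m
    for m in range(1, n):
        ptr, new_score = [], []
        for j in range(d):
            best = max(range(d), key=lambda i: score[i] + _log(q[i][j]))
            ptr.append(best)
            new_score.append(score[best] + _log(q[best][j]) + _log(p_obs[m][j]))
        back.append(ptr)
        score = new_score
    last = max(range(d), key=lambda i: score[i])
    path = [last]
    for ptr in reversed(back):  # backtrack from the final state
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


def brute_force_map(mu, q, p_obs):
    """Exhaustive maximization of (1.1); feasible only for tiny n and d."""
    n, d = len(p_obs), len(mu)

    def likelihood(x):
        val = mu[x[0]] * p_obs[0][x[0]]
        for m in range(1, n):
            val *= q[x[m - 1]][x[m]] * p_obs[m][x[m]]
        return val

    return list(max(itertools.product(range(d), repeat=n), key=likelihood))
```

The recursion costs on the order of $n d^2$ operations, while the exhaustive search visits all $d^n$ paths; the latter is included only as a cross-check on small examples.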

When the next observation $Y_{n+1}$ is added, the optimal path may change entirely, i.e., for any $m = 1, \ldots, n$, $\hat X^{n+1}_m$ is in general different from $\hat X^n_m$. In practical terms, the latter means that^2 $\#S$ optimal path candidates of length $n$ are to be kept in memory at each time $n$. This motivates the question of whether the optimal path stabilizes as the number of observations grows to infinity, or more precisely, whether the limit

$$\hat X_{1:m} = \lim_{n \to \infty} \hat X^n_{1:m} \tag{1.2}$$

exists P-a.s. for each fixed $m \ge 1$. If such a limit exists, it defines a random process with paths in $S^\infty$, coined the Viterbi process in Lember and Koloydenko (2010).

An affirmative answer to this question was given in Caliebe and Rösler (2002) (see also Kogan (1996)) under a sufficient condition (see (2.1) below), which also ensures that the limit sequence $\hat X = (\hat X_m)_{m \ge 1}$ is a regenerative process. More precisely, a sequence of stopping times can be constructed (see Caliebe (2006)), splitting the process $\hat X$ into cycles that are i.i.d. and independent of the initial delay. In particular, by the regenerative property, $\hat X$ satisfies the classical limit laws, such as the LLN and the CLT.

^2 $\#A$ stands for the cardinality of a set $A$.
In fact, the existence of such renewal times under the condition (2.1) can be deduced
by a simple argument (replicated for completeness in Section 2). A more delicate con- 
struction in Lember and Koloydenko (2008, 2010) verifies (1.2) under conditions, weaker 
than (2.1). 

In this paper, we revisit the question of existence of the limit (1.2) for the hidden Markov models (HMMs) with continuous state space, i.e. when $S = \mathbb{R}$ and for each $u \in \mathbb{R}$, the transition kernel $Q(u,dv)$ and the initial distribution $\mathcal{M}(dv)$ have densities $q(u,v)$ and $\mu(v)$ with respect to the Lebesgue measure. By the Bayes formula, the conditional law of the vector $X_{1:n}$, given $Y_{1:n}$, has the density $\psi_n$ with respect to the Lebesgue measure on $\mathbb{R}^n$:

$$\psi_n(x_{1:n}) := \frac{L_n(x_{1:n}; Y_{1:n})}{\int_{\mathbb{R}^n} L_n(z_{1:n}; Y_{1:n})\,dz_1 \ldots dz_n},$$

with $L_n$ defined as in (1.1). The MAP path estimator is

$$\hat X^n_{1:n} = \operatorname{argmax}_{x_{1:n} \in \mathbb{R}^n} \psi_n(x_{1:n}) = \operatorname{argmax}_{x_{1:n} \in \mathbb{R}^n} L_n(x_{1:n}; Y_{1:n}),$$

where as in (1.2), the maximum is chosen according to the lexicographical order on $\mathbb{R}^n$ (induced e.g. by $<$ on $\mathbb{R}$) in case of ambiguity.

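Note that for any $\sigma\{Y_1, \ldots, Y_n\}$-measurable random vector $\xi_{1:n}$ and $\varepsilon > 0$,

$$P(|X_{1:n} - \xi_{1:n}| \le \varepsilon) = E\,P(|X_{1:n} - \xi_{1:n}| \le \varepsilon \mid Y_{1:n}) = E \int_{|x_{1:n}| \le \varepsilon} \psi_n(x_{1:n} + \xi_{1:n})\,dx_1 \ldots dx_n$$

and hence the estimator $\hat X^n_{1:n}$ is optimal in the sense:

$$\lim_{\varepsilon \to 0} \varepsilon^{-n} P(|X_{1:n} - \xi_{1:n}| \le \varepsilon) = 2^n E\,\psi_n(\xi_{1:n}) \le 2^n E \max_{x_{1:n}} \psi_n(x_{1:n}) = \lim_{\varepsilon \to 0} \varepsilon^{-n} P(|X_{1:n} - \hat X^n_{1:n}| \le \varepsilon),$$

whenever interchanging the expectation and the limit is possible. Roughly this means that $\hat X^n_{1:n}$ yields the best "small" credible intervals among all path estimates^3.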

As in the state estimation problems such as filtering, the exact calculation of $\hat X^n_{1:n}$ is
impossible beyond a number of models with a special structure, most notably, Kalman's
linear Gaussian setting. A number of efficient numerical techniques, such as particle
filters, have been developed (see e.g. Cappé, Moulines and Rydén (2005)) to approximate
the conditional law of the hidden state process. In this paper we are concerned with the 
convergence properties of the MAP paths, leaving out the computational issues for further 
investigation. 

In Section 2 we explore through a number of examples, various patterns of convergence 
encountered in (1.2), when the hidden state space is continuous. We also give an example 
of HMM, for which the MAP path does not converge as the estimation time horizon 
increases. In Section 3 we prove a more general result, deducing the existence of the limit (1.2) from certain strong log-concavity of the transition and observation densities. Appendix A contains a lemma which is used in the proof of the main result and might be of interest on its own. Finally, a short discussion of the results appears in Section 4.

^3 In fact, this optimality interpretation turns out to be meaningful even in the infinite dimensional function space, see Zeitouni and Dembo (1987, 1988).

2. Examples 

Let us briefly recall the essential elements of the proof in the finite setting $S = \{1, \ldots, d\}$. For simplicity consider an irreducible finite (and thus recurrent) chain $X$ and define

$$D_i = \{y \in \mathbb{R} : q(x_1, i)\,p(i, y)\,q(i, x_3) > q(x_1, x_2)\,p(x_2, y)\,q(x_2, x_3),\ \forall x_2 \ne i,\ x_1, x_3 \in S\}.$$

Suppose that for a pair of states $j_0$ and $i_0$,

$$\int_{D_{i_0}} p(j_0, y)\,dy > 0. \tag{2.1}$$

Recall the definition of $L_n$ in (1.1) and notice that on the event $A_m = \{X_m = j_0, Y_m \in D_{i_0}\}$ with a fixed $m > 1$ and all $n > m$

$$L_n(x_{1:n}, Y_{1:n}) = L_{m-1}(x_{1:m-1}, Y_{1:m-1})\,q(x_{m-1}, x_m)\,p(x_m, Y_m)\,q(x_m, x_{m+1})\,L_{m+1,n}(x_{(m+1):n}, Y_{(m+1):n})$$
$$\le L_{m-1}(x_{1:m-1}, Y_{1:m-1})\,q(x_{m-1}, i_0)\,p(i_0, Y_m)\,q(i_0, x_{m+1})\,L_{m+1,n}(x_{(m+1):n}, Y_{(m+1):n}),$$

for an appropriate function $L_{m+1,n}$ and where the equality is attained only at a path $x_{1:n}$ with $x_m = i_0$. Hence the $m$-th entry of the optimal path must equal $i_0$ for any $n > m$, i.e. $\hat X^n_m = i_0$. But then, given $\hat X^n_m$, the first $m$ entries of the optimal path depend only on the values of $Y_1, \ldots, Y_m$ and are not affected by $Y_k$, $k > m$. Hence the limit (1.2) exists on the event $A_m$. Since the chain $(X,Y)$ is recurrent, for any fixed $m$, one of the events $A_{m+1}, A_{m+2}, \ldots$ occurs P-a.s. and thus (1.2) holds P-a.s.

Following the same basic idea, let $T(k)$, $k \ge 0$ be the times at which the chain $(X,Y)$ revisits the set $\{j_0\} \times D_{i_0}$:

$$T(0) = 1,$$
$$T(k) = \inf\{n > T(k-1) : X_n = j_0,\ Y_n \in D_{i_0}\}, \quad k \ge 1.$$

By construction, for any $k$, on the event $\{T(k) < n\}$,

$$L(x_{1:n}; Y_{1:n}) \le L(x'_{1:n}; Y_{1:n}), \quad \forall x_{1:n} \in S^n,$$

where $x'_{1:n}$ is the vector which coincides with $x_{1:n}$ at all but the indices $T(1), \ldots, T(k)$, where its entries equal $i_0$.

The upper bound is attained if $L(x_{1:n}; Y_{1:n})$ is maximized over $x_{1:n}$, constrained to $x_{T(1)} = \ldots = x_{T(k)} = i_0$. Since each $x_{T(\ell)}$, $\ell = 1, \ldots, k$ appears in the product $L(x_{1:n}; Y_{1:n})$ in three adjacent terms, the optimal choice of each segment $x_{T(\ell-1)+1:T(\ell)-1}$, $\ell = 1, \ldots, k$ is determined only by the values of $Y_{T(\ell-1)+1}, \ldots, Y_{T(\ell)-1}$. Hence, in particular, the limit $\lim_{n \to \infty} \hat X^n_{1:m}$ exists on any of the events $\{T(k-1) < m < T(k) < \infty\}$, $k \ge 1$. By recurrence of $j_0$ and the condition (2.1), $P(T(k) < \infty) = 1$ and $\lim_{k \to \infty} T(k) = \infty$, P-a.s., which verifies the existence of the limit (1.2).

The stopping times $T(k)$, $k \ge 1$ form a renewal process, with respect to which both $(X,Y)$ and $\hat X = (\hat X_m)_{m \ge 1}$ are regenerative (see Caliebe (2006) for more details). As pointed out in Lember and Koloydenko (2008), the condition (2.1) can be quite restrictive, especially when the transition matrix is sparse. The convergence in (1.2) and the regenerative property are verified in Lember and Koloydenko (2008) under less conservative conditions, using a more sophisticated construction of the renewal times.

In summary, both Caliebe and Rösler (2002) and Lember and Koloydenko (2008) deduce the existence of the limit in (1.2) from the explicit construction of stopping times, based on the discreteness of the hidden process state space. The following example shows that this still may be possible in HMMs with continuous state space.

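Example 2.1. Consider a linear HMM with Laplacian state and Gaussian observation noises:

$$\mu(u) = \tfrac{1}{2} e^{-|u|}, \quad q(u,v) = \tfrac{1}{2} e^{-|u-v|}, \quad p(x,y) = \tfrac{1}{\sqrt{\pi}} e^{-(x-y)^2}.$$

In this case the MAP path is given by

$$\hat X^n_{1:n} = \operatorname{argmin}_{x_{1:n} \in \mathbb{R}^n} \Big( |x_1| + (x_1 - Y_1)^2 + \sum_{m=2}^{n} \big( |x_{m-1} - x_m| + (x_m - Y_m)^2 \big) \Big).$$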

Consider the function $x \mapsto f(x) := |a - x| + (x - y)^2 + |x - b|$ for fixed $a, b, y \in \mathbb{R}$. Suppose w.l.o.g. $a < b$ and note that $f$, being strictly convex, is minimized at a unique point $x^* = \operatorname{argmin}_{x \in \mathbb{R}} f(x)$. If $y \in [a,b]$, then clearly, $x^* \in [a,b]$ and since on this interval $f(x) = -a + (y - x)^2 + b$, we have $x^* = y$. Consider the case $y < a$ and suppose $x^* < a$. For $x < a$, $f(x) = a - x + (y - x)^2 + b - x$ and hence $x^* = y + 1$. By strict convexity, this implies that $x^* = y + 1$, if $y < a - 1$ and that $x^* \ge a$, otherwise. Clearly, $x^* \le b$, i.e. $x^* \in [a,b]$, which in turn implies that $x^* = a$ for $y \in [a-1, a)$. Similar calculations reveal that $x^* = y - 1$, if $y > b + 1$ and $x^* = b$ if $y \in (b, b+1]$.
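
The case analysis above can be checked numerically. The following is a small sketch, with hypothetical helper names (`scalar_minimizer`, `objective`, `grid_minimizer`) introduced here for illustration only; the grid search is a crude sanity check, not part of the argument.

```python
def scalar_minimizer(a, b, y):
    """Closed-form minimizer of f(x) = |a - x| + (x - y)^2 + |x - b|."""
    lo, hi = min(a, b), max(a, b)
    if y < lo - 1:
        return y + 1   # minimizer lies strictly to the left of [lo, hi]
    if y < lo:
        return lo      # minimizer sticks to the left endpoint
    if y <= hi:
        return y       # inside [lo, hi] the absolute values add up to a constant
    if y <= hi + 1:
        return hi      # minimizer sticks to the right endpoint
    return y - 1       # minimizer lies strictly to the right of [lo, hi]


def objective(x, a, b, y):
    return abs(a - x) + (x - y) ** 2 + abs(x - b)


def grid_minimizer(a, b, y, step=1e-3, width=5.0):
    """Brute-force minimizer on a grid centred at y (accuracy about one step)."""
    k = int(width / step)
    grid = [y + i * step for i in range(-k, k + 1)]
    return min(grid, key=lambda x: objective(x, a, b, y))
```

In every case the returned point lies within distance 1 of `y`, which is the property used in the sequel.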

To summarize, $x^* \in [y-1, y+1]$ for any $a, b, y \in \mathbb{R}$ and $x^* = y$, whenever $a \le y \le b$. In particular, $\hat X^n_{m-1} \in [Y_{m-1} - 1, Y_{m-1} + 1]$ and $\hat X^n_{m+1} \in [Y_{m+1} - 1, Y_{m+1} + 1]$ for any $n > m + 1$. Hence on the event

$$A_m := \{Y_{m-1} + 1 \le Y_m \le Y_{m+1} - 1\},$$

$Y_m \in [\hat X^n_{m-1}, \hat X^n_{m+1}]$ and consequently $\hat X^n_m = Y_m$. This in turn implies $\hat X^n_{1:m} = \hat X^{m+1}_{1:m}$ for all $n > m + 1$ and the existence of the limit (1.2) on any of $A_k$, $k \ge m + 1$. Clearly the events $A_m$ occur infinitely often and hence, as in the discrete case, $\hat X^n_{1:m}$ ceases to change starting from some random but P-a.s. finite time $n$. In particular, (1.2) holds P-a.s. ■
However, splitting the optimal trajectory into unrelated segments is not the only way to get the convergence in (1.2): the following example shows that the limit may exist without ever being actually attained.

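Example 2.2. Consider the linear Gaussian HMM with

$$\mu(u) = \tfrac{1}{\sqrt{\pi}} e^{-u^2}, \quad q(u,v) = \tfrac{1}{\sqrt{\pi}} e^{-(u-v)^2}, \quad p(x,y) = \tfrac{1}{\sqrt{\pi}} e^{-(x-y)^2}.$$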

In this case the conditional law of $X_{1:n}$, given $Y_{1:n}$, is Gaussian and hence

$$\hat X^n_{1:n} = E(X_{1:n} \mid Y_{1:n}).$$

For any fixed $m \ge 1$, the process $\hat X^n_{1:m} = E(X_{1:m} \mid Y_{1:n})$, $n \ge m$ is a uniformly integrable vector valued martingale and hence the limit (1.2) exists by the martingale convergence. In fact, the Kalman linear filtering theory (see e.g. Kwakernaak and Sivan (1972)) tells that in this case (of controllable and observable dynamics) the stronger P-a.s. exponential convergence holds (see also Remark 3.2 below).

Moreover, $E(X_{1:m} \mid Y_{1:n})$ is a deterministic linear map of $Y_{1:n}$ and a calculation reveals that it actually depends on each one of the components in $Y_{1:n}$. Since $Y_{1:n}$ is a non-degenerate Gaussian vector,

$$P\big(\hat X^n_j = \hat X^{n'}_j, \text{ for some } j \le m\big) = 0$$

for any $n' > n \ge m$.
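
The behaviour described in this example can be illustrated numerically. The sketch below assumes a linear Gaussian model with unit quadratic costs, so that the MAP path minimizes $x_1^2 + \sum_{m=1}^n (x_m - Y_m)^2 + \sum_{m=2}^n (x_m - x_{m-1})^2$; this specific cost and the function name `gaussian_map_path` are assumptions made for the illustration. Setting the gradient to zero gives a tridiagonal linear system, solved here by the standard Thomas elimination.

```python
import math


def gaussian_map_path(y):
    """MAP path for the assumed cost
    x_1^2 + sum_m (x_m - y_m)^2 + sum_{m>=2} (x_m - x_{m-1})^2.

    The normal equations are tridiagonal: off-diagonal entries -1,
    diagonal entries 3, except the last one, which equals 2.
    """
    n = len(y)
    diag = [3.0] * n
    diag[-1] = 2.0  # the last state has no successor (covers n == 1 as well)
    c = [0.0] * n   # modified super-diagonal
    d = [0.0] * n   # modified right-hand side
    c[0] = -1.0 / diag[0]
    d[0] = y[0] / diag[0]
    for k in range(1, n):  # forward elimination
        denom = diag[k] + c[k - 1]
        c[k] = -1.0 / denom
        d[k] = (y[k] + d[k - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for k in range(n - 2, -1, -1):  # back substitution
        x[k] = d[k] - c[k] * x[k + 1]
    return x


if __name__ == "__main__":
    ys = [math.sin(k) for k in range(1, 40)]  # arbitrary deterministic "observations"
    for n in (5, 10, 20, 30):
        step = abs(gaussian_map_path(ys[: n + 1])[0] - gaussian_map_path(ys[:n])[0])
        print(n, step)
```

Under this assumed cost, the first coordinate of the optimal path typically keeps moving when an observation is added, but the increments shrink geometrically in $n$, in line with the exponential convergence mentioned above.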



Finally, the next example demonstrates that a finite limit in (1.2) may not exist, even 
when the hidden state chain is positive recurrent and has countably many states. In fact, 
it also shows that the optimal MAP path may not be an adequate estimate: in this case, a 
trajectory of a positive recurrent chain $V$ is estimated as a constant trajectory, diverging
to infinity, as $n \to \infty$.

Example 2.3. Consider the HMM with the hidden state process $X_n = (U_n, V_n)$, consisting of independent components $U$ and $V$. The process $U = (U_n)_{n \ge 1}$ is a sequence of i.i.d. random variables uniformly distributed over $[0,1]$.

$V = (V_n)_{n \ge 1}$ is a random walk on positive integers with reflecting boundary at $\{1\}$ and the transition probabilities $P(1,1) = 1 - \varepsilon$, $P(1,2) = \varepsilon$ and for $i \ge 2$,

$$P(i,j) = \begin{cases} \varepsilon \Big/ \Big(1 + \big(\tfrac{i+1}{i}\big)^2\Big), & j = i+1, \\ \varepsilon \Big/ \Big(1 + \big(\tfrac{i}{i+1}\big)^2\Big), & j = i-1, \\ 1 - \varepsilon, & j = i, \end{cases} \tag{2.2}$$

where $\varepsilon > 0$ is a small fixed constant (in fact, we shall choose $\varepsilon < e^{-2}/(1 + e^{-2}) = 0.119\ldots$ later on). $V$ is a positive recurrent Markov chain with the unique invariant distribution

$$\pi(j) = \begin{cases} C/4, & j = 1, \\ \dfrac{C}{j^2}\Big(1 + \big(\tfrac{j}{j+1}\big)^2\Big), & j \ge 2, \end{cases} \tag{2.3}$$

where $C$ is the normalization constant, independent of $\varepsilon$. We shall assume that $V$ is stationary, i.e. it is started from $V_1 \sim \pi$. Stationarity is not really required in what follows and is solely a matter of aesthetics (e.g. $P(V_1 = j) = C/j^2$ will work as well).

Let $a_0 = 0$, $a_i = 8 \sum_{j=1}^{i} (1/9)^j$, $i = 1, 2, \ldots$ and set $\Delta_i = [a_{i-1}, a_i)$, $i \ge 1$. Denote by $\ell_i = 8 (1/9)^i$ the length of the interval $\Delta_i$ and note that $[0,1) = \bigcup_{i=1}^{\infty} \Delta_i$.

Now consider the observation density 

V 

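As we show below, the MAP estimates of $U_{1:n}$ and $V_{1:n}$ are given by^4:

$$\hat U^n_m = \sum_{i \ge 1} a_{i-1} \mathbf{1}_{\{Y_m \in \Delta_i\}}, \qquad \hat V^n_m = \begin{cases} 2, & j^*(n) = 1, \\ j^*(n), & j^*(n) > 1, \end{cases} \qquad m = 1, \ldots, n, \tag{2.4}$$

where $j^*(n) := \max\big\{ j : \sum_{k=1}^{n} \mathbf{1}_{\{Y_k \in \Delta_j\}} > 0 \big\}$. Since all $\Delta_j$'s have positive Lebesgue measure, $j^*(n) \to \infty$ as $n \to \infty$, and consequently, for any fixed $m \ge 1$,

$$\lim_{n \to \infty} \hat V^n_m = \lim_{n \to \infty} j^*(n) = \infty, \quad P\text{-a.s.}$$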
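
The index $j^*(n)$ is a running maximum over the observations and is easy to compute. The sketch below assumes half-open intervals $[a_{i-1}, a_i)$ with $a_i = 8\sum_{j=1}^{i} 9^{-j} = 1 - 9^{-i}$ and observations in $[0,1)$; the function names `interval_index` and `running_j_star` are hypothetical and introduced only for illustration.

```python
def interval_index(y):
    """Return the index i >= 1 with a_{i-1} <= y < a_i, where a_i = 1 - 9**(-i)."""
    if not 0.0 <= y < 1.0:
        raise ValueError("observation assumed to lie in [0, 1)")
    i = 1
    while y >= 1.0 - 9.0 ** (-i):  # y is at or beyond the right endpoint a_i
        i += 1
    return i


def running_j_star(ys):
    """j*(n) for n = 1, ..., len(ys): index of the narrowest interval visited so far."""
    out, best = [], 0
    for y in ys:
        best = max(best, interval_index(y))
        out.append(best)
    return out
```

Because the intervals shrink geometrically towards 1, a large value of $j^*(n)$ is produced only by an observation very close to 1, so the running maximum is non-decreasing and typically grows slowly.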



Before proving (2.4), we shall briefly explain why the optimal path of such a form should be anticipated. Note that since the $U_i$'s are uniformly distributed in $[0,1]$, the choice of the $\hat U^n_m$'s influences the likelihood (1.1) only through the observation densities. More precisely, whenever $\{Y_m \in \Delta_i\}$ is observed, the maximal gain of $1/\ell_i$ is obtained if $\hat U^n_m \in \Delta_i$ and $\hat V^n_m \ge i$ are chosen. On the other hand, the transition probabilities of (2.2) favor paths $\hat V^n_{1:n}$ without jumps. Hence the optimal path $\hat V^n_{1:n}$ should be constant and large enough to allow the access to the narrowest $\Delta_i$ visited by the $Y_m$'s so far, i.e. greater or equal to $j^*(n)$. But if a constant $\hat V^n_{1:n}$ is chosen, it cannot be too large, as this would decrease the likelihood through the term $\pi(\hat V^n_1)$, due to the fast tail decay of the initial distribution $\pi$. This heuristic is implemented by an appropriate balancing between all the ingredients of the model.

^4 The choice of $\hat U^n_m$ is not unique, unless the lexicographic order is imposed: e.g. $\hat U^n_m := Y_m$ yields the same value of the likelihood.
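Let us first check (2.4) in the case $j^*(n) > 1$. To this end, consider the ratio

$$\frac{L_n\big((u_{1:n}, v_{1:n}), Y_{1:n}\big)}{L_n\big((\hat U^n_{1:n}, \hat V^n_{1:n}), Y_{1:n}\big)} = \frac{\pi(v_1)}{\pi(j^*(n))} \prod_{m=2}^{n} \frac{P(v_{m-1}, v_m)}{P(j^*(n), j^*(n))} \prod_{m=1}^{n} \frac{p\big((u_m, v_m), Y_m\big)}{p\big((\hat U^n_m, j^*(n)), Y_m\big)} \tag{2.5}$$

for arbitrary $u_{1:n}$ and $v_{1:n}$. Let $N$ be the number of jumps in $v_{1:n}$ and $v^*(n) = \max_{k=1,\ldots,n} v_k$. Note that $P(v_{m-1}, v_m) = 1 - \varepsilon$, when $v_{m-1} = v_m$ and $P(v_{m-1}, v_m) \le \varepsilon$ otherwise. Hence, as $P(j^*(n), j^*(n)) = 1 - \varepsilon$,

$$\prod_{m=2}^{n} \frac{P(v_{m-1}, v_m)}{P(j^*(n), j^*(n))} \le \Big(\frac{\varepsilon}{1 - \varepsilon}\Big)^N.$$

Further, note that on the event $\{Y_m \in \Delta_i\}$, $p\big((u_m, v_m), Y_m\big) \le 1 \vee \ell_i^{-1} = \ell_i^{-1}$ and $p\big((\hat U^n_m, j^*(n)), Y_m\big) = \ell_i^{-1}$ and thus

$$\frac{p\big((u_m, v_m), Y_m\big)}{p\big((\hat U^n_m, j^*(n)), Y_m\big)} \le 1.$$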
Moreover, on {Y,n € ^j*(n)}, 



{{Um,Vm),Ym) ^ ^{t'* (") <3 *(»)}+ ^j»^(n) ^ {"•' (")>3-' (»)} ^ -^D-VjAj* (ri) 



Plugging these inequalities into (2.5), we get: 



7r(vi) .Ar7r(u*(n)) «„.(„)Aj.(„) 



(2.6) 



^{v*{n)) 7r{j*{n)) e-\,^^ 
where $\tilde\varepsilon := \varepsilon/(1 - \varepsilon)$ is set for brevity. Since $N \ge v^*(n) - v_1$,

^K) gjv < 7r(t>i) ^^»(„),^^ 
7r(u*(n)) ~ 7T{v*{n)) 

1 



u*(n) + l 

where in the second inequality we used the expression for $\pi(j)$, $j > 1$ from (2.3). In fact, the inequality is true also for $v^*(n) = v_1 = 1$, as both the right and the left hand side turn to 1, and for $v^*(n) > v_1 = 1$, as $\pi(1)$ is less than $\frac{C}{j^2}\big(1 + \big(\frac{j}{j+1}\big)^2\big)$ evaluated at $j := 1$.

The function $x \mapsto x^2 \tilde\varepsilon^{\,x}$ attains its maximum at $x^* = 2/\log \tilde\varepsilon^{-1}$ and is strictly decreasing on $(x^*, \infty)$. Hence with $\tilde\varepsilon < e^{-2}$, that is with $\varepsilon < e^{-2}/(1 + e^{-2})$, for any $y > x \ge 1$, $(y/x)^2 \tilde\varepsilon^{\,y-x} < 1$ and hence



, , < 1- (2.7) 

The equality holds if and only if $v_{1:n}$ is a constant path, i.e. $v_m = v^*(n)$ for all $m = 1, \ldots, n$. Further, if $v^*(n) < j^*(n)$,

/J— 1 2 ^ -\- ( {^) \^ 



<J*{n)) I-} -Win) J 



(2., 



where the latter inequality holds since $1/9 < e^{-2}/(1 + e^{-2})$.

The sequence $\pi(j)$ attains its unique maximum at $j := 2$ and is strictly decreasing for $j > 2$. Hence, if $v^*(n) \ge j^*(n) \ge 2$,

$$\pi(v^*(n)) \le \pi(j^*(n)).$$

Plugging (2.7) and (2.8) into (2.6) yields the inequality

$$L_n\big((u_{1:n}, v_{1:n}), Y_{1:n}\big) \le L_n\big((\hat U^n_{1:n}, \hat V^n_{1:n}), Y_{1:n}\big)$$

for any $u_{1:n}$ and $v_{1:n}$, which saturates if and only if $v_m = j^*(n)$, $m = 1, \ldots, n$, thus verifying optimality of (2.4) on the event $\{j^*(n) > 1\}$.

We shall omit the details in the case $\{j^*(n) = 1\}$, which is treated similarly: the optimal value $\hat V^n_m = 2$ is obtained, since $\pi(j)$ is maximal at $j = 2$. Of course, as $j^*(n)$ eventually leaves the state 1, the exact value is irrelevant for the main point of the present example, that is the divergence $\lim_{n \to \infty} \hat V^n_m = \infty$.



3. Convergence in the case of log-concave densities 

In this section we establish the existence of the limit (1.2), deducing it from certain strong log-concavity properties of the densities $q$ and $p$. Hereafter the following assumptions are in force:

a1. the initial state density $\mu$ is a $C^2(\mathbb{R})$ log-concave function on $\mathbb{R}$ and $-\log \mu(u) \ge 0$;
a2. the hidden state transition density $q$ is a $C^2(\mathbb{R}^2)$ log-concave function, namely^5 $q(u,v) \propto e^{-\alpha(u,v)}$, where $\alpha(u,v)$ is a nonnegative twice continuously differentiable convex function on $\mathbb{R}^2$;
a3. the observation density $p$ is a $C^2(\mathbb{R})$ log-concave function in the first argument: $p(x,y) \propto e^{-\gamma(x,y)}$, where for each $y \in \mathbb{R}$ the function $x \mapsto \gamma(x,y)$ is nonnegative, twice continuously differentiable and strongly convex on $\mathbb{R}$ with $x^*(y) := \operatorname{argmin}_{x \in \mathbb{R}} \gamma(x,y) \in (-\infty, \infty)$ and

$$\frac{\partial^2}{\partial x^2} \gamma(x,y) \ge \kappa > 0, \quad \forall x, y \in \mathbb{R},$$

with a constant $\kappa$.
a4. for some constant $C$,

$$\varlimsup_{n \to \infty} \frac{1}{n} \big( -\log L_n(X_{1:n}, Y_{1:n}) \big) \le C, \quad P\text{-a.s.}$$

a5. there is a non-decreasing function $g : \mathbb{R}_+ \to \mathbb{R}_+$, growing to $+\infty$ not faster than a polynomial, such that for all $M \ge 0$

$$\alpha(x,y) \le M \implies \Big| \frac{\partial^2}{\partial x \partial y} \alpha(x,y) \Big| \le g(M), \quad \forall x, y \in \mathbb{R}.$$

^5 $f \propto g$ means that $f/g$ is constant.

Remark 3.1. The log-concavity assumptions (a1)-(a3) are quite restrictive. For example, if $Y_n = h(X_n) + W_n$ with $W_n \sim N(0,1)$, then

$$\frac{\partial^2}{\partial x^2} \gamma(x,y) = \frac{1}{2} \frac{\partial^2}{\partial x^2} \big(y - h(x)\big)^2 = \big(h'(x)\big)^2 - \big(y - h(x)\big) h''(x),$$

which typically will not admit the uniform lower bound of (a3), unless $h$ is linear, i.e. $h''(x) = 0$.

If the assumption (a3) is satisfied, it implies $\gamma_*(y) := \gamma(x^*(y), y) \in (-\infty, \infty)$, for all $y \in \mathbb{R}$ and, moreover,

$$\gamma(x,y) - \gamma_*(y) \ge \frac{1}{2} \kappa \big(x - x^*(y)\big)^2, \quad \forall x, y \in \mathbb{R}, \tag{3.1}$$

which is essential to our approach.

Assuming that $-\log \mu(u)$, $\alpha(u,v)$ and $\gamma(x,y)$ are nonnegative is equivalent to assuming that they are lower bounded by a constant, i.e. that the corresponding densities are bounded.

The assumption (a4) is typically satisfied, if the state process $X$ is positively recurrent (explicit recurrence tests can be found in Meyn and Tweedie (2009); see also Genon-Catalot, Jeantheau and Laredo (2000)). Finally, (a5) is a technical assumption, which is satisfied in most models of practical interest.

Example 3.1. All the above assumptions are satisfied for the linear HMM 

Xn = aXn-1 +Vn, n>l 
Yn = bXn + Wn, 
where $|a| < 1$ and $b \ne 0$ are constants and $v = (v_n)_{n \ge 1}$ and $w = (w_n)_{n \ge 1}$ are independent
sequences of i.i.d. random variables with

^o,Wn /t-(a;) oc e~l^l'^' and u;„ - /^(x) cx e-^'^^+^^l^l' ^ 

for some 5 > and 5' > 0, c > 0. ■ 

Theorem 3.1. The limit in (1.2) exists P-a.s.

Proof. To keep the notations simple, we shall prove the convergence in (1.2) for $m = 1$, i.e. that the limit $\lim_{n \to \infty} \hat X^n_1$ exists P-a.s. As will be clear from the proof below, the same arguments imply convergence of $\lim_{n \to \infty} \hat X^n_i$ for any $i \le m$ and hence of (1.2) for any fixed $m \ge 1$.

To check that $\lim_{n \to \infty} \hat X^n_1$ exists P-a.s., we shall show that on a set of probability one the series

$$\hat X^n_1 = \hat X^1_1 + \sum_{i=2}^{n} \big( \hat X^i_1 - \hat X^{i-1}_1 \big)$$

is convergent. The proof hinges on the system of inequalities (3.6) and (3.7), which stem from the log-concavity properties assumed in (a1)-(a3). A pigeonhole principle type of argument (Lemma A.1) shows that a sequence satisfying such inequalities must decay at least polynomially backward in time, which in turn yields the desired conclusion.
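To this end, introduce^6

$$h_n(x_{1:n}) := -\log L_n(x_{1:n}, Y_{1:n}) = -\log \mu(x_1) + \gamma(x_1, Y_1) + \sum_{m=2}^{n} \big( \alpha(x_{m-1}, x_m) + \gamma(x_m, Y_m) \big). \tag{3.2}$$

By assumptions (a1)-(a3), $\lim_{r \to \infty} \inf_{\|x_{1:n}\| \ge r} h_n(x_{1:n}) = \infty$, and for any $n \ge 1$ the function

$$x_{1:n} \mapsto h_n(x_{1:n}) + \alpha(x_n, u) \tag{3.3}$$

attains its global minimum at

$$\hat X^n_{1:n}(u) := \operatorname{argmin}_{x_{1:n} \in \mathbb{R}^n} \big( h_n(x_{1:n}) + \alpha(x_n, u) \big), \quad u \in \mathbb{R}.$$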



The Hessian matrix of the function defined in (3.3) is positive definite uniformly over $x_{1:n} \in \mathbb{R}^n$ and hence the minimum is unique and $\hat X^n_{1:n}(u)$ is the solution of

$$\operatorname{grad} \big( h_n(x_{1:n}) + \alpha(x_n, u) \big) = 0.$$

The Jacobian matrix of the function in the left hand side of this equation with respect to the vector $x_{1:n}$ coincides with the aforementioned Hessian matrix and hence is invertible at any $u \in \mathbb{R}$. Thus by the implicit function theorem $u \mapsto \hat X^n_{1:n}(u)$ is continuously differentiable on $\mathbb{R}$.

^6 For $k > \ell$, $\sum_{m=k}^{\ell} \ldots = 0$ is understood.

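The usual dynamic programming argument yields the following chain rules:

$$\hat X^n_j = \hat X^m_j\big(\hat X^n_{m+1}\big), \qquad \hat X^n_j(x) = \hat X^m_j\big(\hat X^n_{m+1}(x)\big), \qquad j \le m < n. \tag{3.4}$$

Hence for $j < n$, and $j < m \le n$,

$$\hat X^{n+1}_j - \hat X^n_j = \big( \hat X^{n+1}_m - \hat X^n_m \big) \int_0^1 \frac{d}{dx} \hat X^{m-1}_j(x) \Big|_{x = s \hat X^{n+1}_m + (1-s) \hat X^n_m} ds. \tag{3.5}$$

The following lemma is the key to a bound on the integrand in (3.5):

Lemma 3.1. Assume (a1)-(a3), then for $j = 1, \ldots, n-1$,

$$\sum_{i=1}^{j} \Big| \frac{d}{dx} \hat X^n_i(x) \Big|^2 \le \frac{2}{\kappa} \Big| D_{12}\alpha\big(\hat X^n_j(x), \hat X^n_{j+1}(x)\big) \Big| \, \Big| \frac{d}{dx} \hat X^n_{j+1}(x) \Big| \, \Big| \frac{d}{dx} \hat X^n_j(x) \Big| \tag{3.6}$$

and

$$\sum_{i=1}^{n} \Big| \frac{d}{dx} \hat X^n_i(x) \Big|^2 \le \frac{2}{\kappa} \Big| D_{12}\alpha\big(\hat X^n_n(x), x\big) \Big| \, \Big| \frac{d}{dx} \hat X^n_n(x) \Big|, \tag{3.7}$$

where $D_{12}\alpha(x,y) := \frac{\partial^2}{\partial x \partial y} \alpha(x,y)$, and $\kappa$ is as in Assumption (a3).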



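Proof. Recall that the function (3.3) is convex with the Hessian greater than $\kappa$ times the identity matrix, with respect to the positive definite ordering. Hence, for any $1 \le j \le n$ and $u, v \in \mathbb{R}$, by (3.1),

$$h_j\big(\hat X^j_{1:j}(v)\big) + \alpha\big(\hat X^j_j(v), u\big) - h_j\big(\hat X^j_{1:j}(u)\big) - \alpha\big(\hat X^j_j(u), u\big) \ge \frac{\kappa}{2} \big\| \hat X^j_{1:j}(v) - \hat X^j_{1:j}(u) \big\|^2,$$

since, by definition, the minimum of $h_j(x_{1:j}) + \alpha(x_j, u)$ over $x_{1:j}$ is attained at $\hat X^j_{1:j}(u)$. Further, by definition of $\hat X^j_{1:j}(v)$,

$$h_j\big(\hat X^j_{1:j}(v)\big) + \alpha\big(\hat X^j_j(v), v\big) \le h_j\big(\hat X^j_{1:j}(u)\big) + \alpha\big(\hat X^j_j(u), v\big),$$

which gives

$$\frac{\kappa}{2} \big\| \hat X^j_{1:j}(v) - \hat X^j_{1:j}(u) \big\|^2 \le -\alpha\big(\hat X^j_j(v), v\big) + \alpha\big(\hat X^j_j(u), v\big) + \alpha\big(\hat X^j_j(v), u\big) - \alpha\big(\hat X^j_j(u), u\big). \tag{3.8}$$

Plugging $v := \hat X^n_{j+1}(x + h)$ and $u := \hat X^n_{j+1}(x)$ with $x \in \mathbb{R}$ and using the chain rule (3.4), we get

$$\frac{\kappa}{2} \big\| \hat X^n_{1:j}(x + h) - \hat X^n_{1:j}(x) \big\|^2 \le -\alpha\big(\hat X^n_j(x+h), \hat X^n_{j+1}(x+h)\big) + \alpha\big(\hat X^n_j(x), \hat X^n_{j+1}(x+h)\big) + \alpha\big(\hat X^n_j(x+h), \hat X^n_{j+1}(x)\big) - \alpha\big(\hat X^n_j(x), \hat X^n_{j+1}(x)\big).$$

Since all the functions appearing in the latter inequality are twice continuously differentiable, dividing by $h^2$ and taking $h \to 0$ gives the bound (3.6). Similarly, with $j := n$, $v := x + h$ and $u := x$, (3.8) yields (3.7). □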

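By assumption (a4),

$$\Omega' := \Big\{ \varlimsup_{n \to \infty} \frac{1}{n} \sum_{j=2}^{n} \big( \alpha(X_{j-1}, X_j) + \gamma(X_j, Y_j) \big) \le C \Big\}$$

is an event of full probability and hence it is enough to verify the claimed convergence for all $\omega \in \Omega'$. Clearly, for an $\omega \in \Omega'$,

$$-\log \mu(X_1) + \gamma(X_1, Y_1) + \sum_{j=2}^{n} \big( \alpha(X_{j-1}, X_j) + \gamma(X_j, Y_j) \big) \le 2Cn, \quad \forall n \ge N(\omega),$$

for an integer $N(\omega) < \infty$. Then $\hat X^n_{1:n}$, being a minimizer, a fortiori satisfies:

$$-\log \mu(\hat X^n_1) + \gamma(\hat X^n_1, Y_1) + \sum_{j=2}^{n} \big( \alpha(\hat X^n_{j-1}, \hat X^n_j) + \gamma(\hat X^n_j, Y_j) \big) \le 2Cn, \quad \forall n \ge N. \tag{3.9}$$

Hence for a large fixed constant $M > 4C$ and any $n \ge N$,

$$\#\big\{ j : \alpha(\hat X^n_{j-1}, \hat X^n_j) + \gamma(\hat X^n_j, Y_j) > M \big\} \le \frac{2C}{M} n =: \rho n.$$

Similarly

$$\#\big\{ j : \alpha(\hat X^{n+1}_{j-1}, \hat X^{n+1}_j) + \gamma(\hat X^{n+1}_j, Y_j) > M \big\} \le \rho (n + 1).$$

Then there is an index $m \in [n - 2\rho n, n]$, such that

$$\alpha(\hat X^n_{m-1}, \hat X^n_m) + \gamma(\hat X^n_m, Y_m) \le M, \quad \text{and} \quad \alpha(\hat X^{n+1}_{m-1}, \hat X^{n+1}_m) + \gamma(\hat X^{n+1}_m, Y_m) \le M,$$

and, by the assumption (a3),

$$\big| \hat X^{n+1}_m - \hat X^n_m \big| \le \big| \hat X^{n+1}_m - x^*(Y_m) \big| + \big| \hat X^n_m - x^*(Y_m) \big|$$
$$\le \sqrt{\tfrac{2}{\kappa} \big( \gamma(\hat X^{n+1}_m, Y_m) - \gamma_*(Y_m) \big)} + \sqrt{\tfrac{2}{\kappa} \big( \gamma(\hat X^n_m, Y_m) - \gamma_*(Y_m) \big)}$$
$$\le \sqrt{\tfrac{2}{\kappa} \gamma(\hat X^n_m, Y_m)} + \sqrt{\tfrac{2}{\kappa} \gamma(\hat X^{n+1}_m, Y_m)} \le \sqrt{\frac{8M}{\kappa}}.$$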
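Plugging this estimate into (3.5), we get (for $j := 1$)

$$\big| \hat X^{n+1}_1 - \hat X^n_1 \big| \le \sqrt{\frac{8M}{\kappa}} \int_0^1 \Big| \frac{d}{dx} \hat X^{m-1}_1(x) \Big|_{x = s \hat X^{n+1}_m + (1-s) \hat X^n_m} \Big| \, ds. \tag{3.10}$$

Introduce

$$X^m_m(s) := s \hat X^{n+1}_m + (1 - s) \hat X^n_m,$$
$$X^m_j(s) := \hat X^{m-1}_j\big( X^m_m(s) \big), \quad j = 1, \ldots, m-1,$$

and define

$$c_j(s) := \frac{2}{\kappa} \big| D_{12}\alpha\big( X^m_j(s), X^m_{j+1}(s) \big) \big|, \qquad b_j(s) := \Big| \frac{d}{dx} \hat X^{m-1}_j(x) \Big|_{x = X^m_m(s)} \Big|, \qquad j = 1, \ldots, m-1.$$

Then from (3.6) and (3.7) (the dependence on $s$ is now omitted for brevity)

$$\sum_{i=1}^{j} b_i^2 \le c_j b_j b_{j+1}, \quad j < m - 1, \tag{3.11}$$

$$\sum_{i=1}^{m-1} b_i^2 \le c_{m-1} b_{m-1}, \tag{3.12}$$

and (3.10) reads:

$$\big| \hat X^{n+1}_1 - \hat X^n_1 \big| \le \sqrt{\frac{8M}{\kappa}} \int_0^1 b_1(s)\,ds. \tag{3.13}$$

Lemma 3.2. For any $s \in [0,1]$, $x > 0$, and $g(\cdot)$ as in (a5),

$$\#\big\{ j < m : c_j(s) > \tfrac{2}{\kappa} g(x) \big\} \le \frac{4C}{x(1 - 2\rho)}\, m. \tag{3.14}$$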
(3.14) 
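Proof. The function $u \mapsto \min_{x_{1:n}} \big( h_n(x_{1:n}) + \alpha(x_n, u) \big)$ is convex and hence

$$\sum_{j=2}^{m} \alpha\big( X^m_{j-1}(s), X^m_j(s) \big) \le \min_{x_{1:m-1}} \Big( h_{m-1}(x_{1:m-1}) + \alpha\big( x_{m-1}, s \hat X^{n+1}_m + (1-s) \hat X^n_m \big) \Big)$$
$$\le s \min_{x_{1:m-1}} \Big( h_{m-1}(x_{1:m-1}) + \alpha\big( x_{m-1}, \hat X^{n+1}_m \big) \Big) + (1 - s) \min_{x_{1:m-1}} \Big( h_{m-1}(x_{1:m-1}) + \alpha\big( x_{m-1}, \hat X^n_m \big) \Big)$$
$$= s \Big( h_{m-1}(\hat X^{n+1}_{1:m-1}) + \alpha\big( \hat X^{n+1}_{m-1}, \hat X^{n+1}_m \big) \Big) + (1 - s) \Big( h_{m-1}(\hat X^n_{1:m-1}) + \alpha\big( \hat X^n_{m-1}, \hat X^n_m \big) \Big)$$
$$\le 2C(n + 1),$$

where the latter inequality follows from (3.9). Hence

$$\#\big\{ j < m : \alpha\big( X^m_j(s), X^m_{j+1}(s) \big) > x \big\} \le \frac{2C(n+1)}{x}$$

and, since $m \ge (1 - 2\rho) n$, (3.14) follows from the assumption (a5). □

Now by Corollary A.1 in the Appendix, applied to (3.11)-(3.12) and (3.14), for any $\beta > 1$, there is a constant $C_\beta$, such that

$$b_1 \le C_\beta m^{-\beta} \le C_\beta (1 - 2\rho)^{-\beta} n^{-\beta}, \tag{3.15}$$

for all sufficiently large $n$, and thus, by (3.13), the sequence $|\hat X^{n+1}_1 - \hat X^n_1|$, $n \ge 1$ is summable, which verifies the existence of the limit (1.2). □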

Remark 3.2. When the hidden state process is a Gaussian autoregression, i.e. when $\alpha(x,y) = \frac{1}{2}(y - bx)^2$ with a constant $b \ne 0$, $|D_{12}\alpha(x,y)| = |b|$ and Lemma A.1 (1) implies an exponential bound in (3.15), confirming the results deducible from the Kalman linear filtering theory.



4. Concluding remarks 

As indicated by the examples of Section 2 and the partial results of Theorem 3.1, the 
convergence in (1.2) appears to be a non-trivial issue. Analogous problems have been 
discussed in the engineering literature. In fact, the MAP path estimation can be viewed 
as an optimal control problem, in which one is required to minimize the cost functional 
$h_n(x_{1:n})$ defined in (3.2), where the term $\alpha(x_{m-1}, x_m)$ is interpreted as the cost incurred by the control effort (needed to move from $x_{m-1}$ to $x_m$) and $\gamma(x_m, Y_m)$ is the cost paid for the deviation of the state from $Y_m$. This setting appears in R. Bellman's book Bellman (1957), Ch. I, Sec. 1.7, as the "smoothing" problem and in the control literature is often referred to as the tracking problem. From the control theory perspective, the existence of the limit in (1.2) means that the optimal control and the corresponding optimal trajectory cease to depend on the future values of the exogenous signal $Y$.

Among other related questions, the convergence (1.2) of the optimal trajectory is a
part of the "asymptotic control theory" program, initiated by R. Kalman, R. Bellman and
R. Bucy at the dawn of modern control theory. In the linear state/quadratic cost
(LQ) setting of R. Kalman, the control problem admits an elegant closed form solution 
for each fixed horizon n and the study of the limit (1.2) reduces to the stability analysis 
of the associated Riccati equation (a comprehensive treatment of the LQ problem can 
be found in e.g. Kwakernaak and Sivan (1972)). 

To the best of our knowledge, asymptotic analysis beyond the LQ case has been carried 
out only for a limited number of nonlinear models. Bellman and Bucy (1964) found a 
remarkable explicit solution to a quite general scalar continuous-time control problem, 
amenable to asymptotic analysis. A vector control problem with linear state dynamics 
and convex costs was studied in Bucy (1966). 

While much progress has been achieved in the optimal control theory on the infinite horizon (see e.g. Carlson, Haurie and Leizarowitz (1991), Zaslavski (2006)), we were not able to find any results directly applicable to the question under consideration.

Another possible connection, remaining elusive at the moment, is to the stability 
theory of nonlinear filtering equation, developed during the last decade (see e.g. the 
survey Chigansky, Liptser and Van Handel (2009)). 

Appendix A: A supporting lemma 

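Lemma A.1. Consider the system of inequalities:

$$\sum_{i=1}^{j} b_i^2 \le c_j b_j b_{j+1}, \quad j = 1, \ldots, n-1, \qquad \sum_{i=1}^{n} b_i^2 \le c_n b_n, \tag{A.1}$$

where $b_i$ and $c_i$, $i = 1, \ldots, n$ are nonnegative real numbers and let $\theta$ and $\theta'$ be arbitrary positive constants.

1. If $c_i \le \theta$, $i = 1, \ldots, n$ then

$$b_1^2 \le e\,\theta \exp\Big( -\frac{n}{2e(\theta^2 \vee \theta)} \Big) \quad \text{for } n \ge \theta^2 e. \tag{A.2}$$

2. If for a non-decreasing nonnegative function $g : \mathbb{R}_+ \to \mathbb{R}_+$,

$$\#\{ i \le n : c_i > g(x) \} \le \frac{\theta n}{x}, \quad \forall x > 0, \tag{A.3}$$

and $c_n \le \theta'$, then for any $\rho \in (0,1)$ and $\ell > \theta$

$$b_1^2 \le g(\ell)\, n^{-\rho\ell/(2\theta)} \quad \text{for } n \ge \Big( \frac{\ell (\theta'^2 \vee g(\ell))}{\theta} \Big)^{1/(1-\rho)}. \tag{A.4}$$

3. If only (A.3) holds, then for any $\rho \in (0,1)$ and $\ell > \theta$,

$$b_1 \le g(2\theta n) \sqrt{g(\ell)}\, n^{-\rho\ell/(4\theta)} \quad \text{for } n \ge \Big( \frac{\ell (1 \vee g(\ell))}{\theta} \Big)^{1/(1-\rho)}. \tag{A.5}$$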

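Proof.

1) The second inequality in (A.1) and $c_n \le \theta$ imply $b_n^2 \le b_n \theta$ and in turn $b_1^2 + \ldots + b_n^2 \le \theta^2$. Fix a constant $\eta \in (0,1)$ and let $m_1 := [\theta^2/\eta]$. Then at most half of the $b_i$'s with $i \in [n - 2m_1, n]$ are greater than $\sqrt{\eta}$ and hence there is an index $k_1 \in [n - 2m_1, n]$, such that $b_{k_1} \le \sqrt{\eta}$ and $b_{k_1+1} \le \sqrt{\eta}$. The inequality corresponding to $j := k_1$ in (A.1) then gives the bound $b_1^2 + \ldots + b_{k_1}^2 \le b_{k_1} b_{k_1+1} c_{k_1} \le \eta\theta$.

Similarly, let $m_2 := [\theta/\eta]$, then there is an index $k_2 \in [k_1 - 2m_2, k_1]$, such that $b_{k_2} \le \eta$ and $b_{k_2+1} \le \eta$ and, again, applying (A.1), $b_1^2 + \ldots + b_{k_2}^2 \le b_{k_2} b_{k_2+1} c_{k_2} \le \eta^2\theta$. This argument can be iterated at least

$$\Big[ \frac{n}{2(m_1 \vee m_2)} \Big] \ge \Big[ \frac{\eta n}{2(\theta^2 \vee \theta)} \Big]$$

times and thus

$$b_1^2 \le \theta\, \eta^{[\eta n/(2(\theta^2 \vee \theta))]} \le \frac{\theta}{\eta} \big( \eta^{\eta} \big)^{n/(2(\theta^2 \vee \theta))}.$$

The best rate is obtained at $\eta := e^{-1}$, which yields the bound (A.2).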
2) For a fixed $\ell > \theta$, by (A.3)

$$\#\{ i \le n : c_i > g(\ell) \} \le \frac{\theta n}{\ell} =: rn, \tag{A.6}$$

and thus at least half of the $c_i$'s with $i \in [n - 2rn, n]$ do not exceed $g(\ell)$. Fix a constant $\rho \in (0,1)$ and let $\eta := n^{-\rho/2}$. Suppose that for all $i \in [n - 2rn, n]$, such that $c_i \le g(\ell)$, either $b_i > \eta$ or $b_{i+1} > \eta$ or both; then

$$\#\{ i \in [n - 2rn, n] : b_i > \eta \} \ge rn.$$

But on the other hand, by the second inequality in (A.1) and as $c_n \le \theta'$, $b_n^2 \le b_n \theta'$ and $b_1^2 + \ldots + b_n^2 \le \theta'^2$ and hence

$$\#\{ i \in [n - 2rn, n] : b_i > \eta \} \le \frac{\theta'^2}{\eta^2} = \theta'^2 n^{\rho}.$$

This contradicts the previous estimate if $n$ is large enough, namely, if $n > (\ell\theta'^2/\theta)^{1/(1-\rho)}$. Thus for such $n$, there is an index $m_1 \in [n - 2rn, n]$, such that $b_{m_1} \le \eta$, $b_{m_1+1} \le \eta$ and $c_{m_1} \le g(\ell)$.

Now by the inequality in (A.1), corresponding to $j := m_1$,

$$b_1^2 + \ldots + b_{m_1}^2 \le b_{m_1} b_{m_1+1} c_{m_1} \le \eta^2 g(\ell), \tag{A.7}$$

for which the above consideration can be repeated. Namely, by (A.6), there are at least $rn$ indices $i \in [m_1 - 2rn, m_1]$, for which $c_i \le g(\ell)$. Suppose that for all of them either $b_i > \eta^2$ or $b_{i+1} > \eta^2$ or both, then

$$\#\{ i \in [m_1 - 2rn, m_1] : b_i > \eta^2 \} \ge rn,$$

while (A.7) implies

$$\#\{ i \in [m_1 - 2rn, m_1] : b_i > \eta^2 \} \le \frac{\eta^2 g(\ell)}{\eta^4} = n^{\rho} g(\ell),$$

which is a contradiction for $n$ large enough, i.e. for $n > (\ell g(\ell)/\theta)^{1/(1-\rho)}$. Hence there is an $m_2 \in [m_1 - 2rn, m_1]$, such that $b_{m_2} \le \eta^2$, $b_{m_2+1} \le \eta^2$ and $c_{m_2} \le g(\ell)$ and thus by (A.1)

$$b_1^2 + \ldots + b_{m_2}^2 \le \eta^4 g(\ell).$$

This argument can be iterated at least $\lfloor 1/(2r) \rfloor$ times, which yields the bound:

$$b_1^2 \le g(\ell) \big( \eta^{1/r} \big) = g(\ell)\, n^{-\rho\ell/(2\theta)}.$$

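3) Note that $b'_i := b_i/g(2\theta n)$, $i = 1, \ldots, n$ satisfy the inequalities (A.1) with the $c_i$'s replaced with $c'_i := c_i$, $i = 1, \ldots, n-1$ and $c'_n := c_n/g(2\theta n)$. By (A.3),

$$\#\{ i \le n : c_i > g(2\theta n) \} \le \frac{\theta n}{2\theta n} = 1/2,$$

i.e. all $c_i$'s are less than $g(2\theta n)$ and, in particular, $c_n \le g(2\theta n)$, that is $c'_n \le 1$. Moreover, assuming that $g(2\theta n) \ge 1$,

$$\#\{ i \le n : c'_i > g(x) \} \le \#\{ i \le n : c_i > g(x) \} \le \frac{\theta n}{x}, \quad \forall x > 0.$$

Hence by (A.4)

$$b'_1 \le \sqrt{g(\ell)}\, n^{-\rho\ell/(4\theta)} \quad \text{for } n \ge \Big( \frac{\ell (1 \vee g(\ell))}{\theta} \Big)^{1/(1-\rho)},$$

which in turn gives (A.5). □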
Corollary A.1. Under the assumption (A.3) with $g(\cdot)$ growing to $+\infty$ not faster than a polynomial, for any $\beta > 1$, there is a constant $C_\beta$, such that

$$b_1 \le C_\beta n^{-\beta},$$

for all sufficiently large $n$.

Proof. Follows from (3) of Lemma A.1. □